Filter by sample #66

Merged
jamescasbon merged 25 commits into jamescasbon:master from lennax:lenna
Feb 23, 2014

@lennax lennax commented Jul 9, 2012

  • New class vcf.SampleFilter modifies Reader to filter each row's samples as they are being read
  • The class destructor removes the monkey patch, but the modified Reader still works normally
  • Class can be used as a module or via the command line interface vcf_sample_filter.py
  • Samples to filter can be specified by name or index
  • Specified samples are filtered by default but can be kept by specifying "invert"
  • Filter can write to any writable object (stdout, specified outfile, etc)
  • Errors and status are given with warnings and logging to allow customization
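The selection rules in the list above (by name or index, with an optional "invert") can be sketched as follows. This is a minimal illustration, not the code in this pull request: the function name `resolve_drop_indices` and its signature are hypothetical, and it assumes integers are indices and strings are sample names.

```python
import warnings


def resolve_drop_indices(sample_names, filters, invert=False):
    """Return the sorted indices of the samples to drop (hypothetical helper).

    `filters` may mix sample names (str) and positions (int).
    With invert=True the listed samples are kept and all others are dropped.
    """
    chosen = set()
    for item in filters:
        if isinstance(item, int):
            if 0 <= item < len(sample_names):
                chosen.add(item)
            else:
                warnings.warn("Sample index %d is out of range" % item)
        elif item in sample_names:
            chosen.add(sample_names.index(item))
        else:
            # Report the problem without aborting, so callers can customize handling
            warnings.warn("Unknown sample name %r" % (item,))
    if invert:
        chosen = set(range(len(sample_names))) - chosen
    return sorted(chosen)


names = ["NA00001", "NA00002", "NA00003"]
drop = resolve_drop_indices(names, ["NA00001", 2])
kept = [n for i, n in enumerate(names) if i not in drop]
```

Here `drop` contains the positions of `NA00001` and the third sample, so only `NA00002` is kept. Passing `invert=True` with the same filters would keep those two samples instead.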

@benjeffery

Hi,
I only came across this pull after I needed a sample filter and wrote a quick one myself (albeit without CLI integration). Am I right in thinking that this code still parses all the samples even if they are to be filtered? I have a VCF with 1600 samples so parsing them all is very costly and the main point of the filter was to prevent this. I only pass the wanted samples to the "_parse_samples" call. You can see my diff at https://github.com/benjeffery/PyVCF/compare/optimise
Thanks!

@lennax
Author

lennax commented Sep 5, 2012

My goal was to not modify the source code of the Reader class at all. My monkey patch/decorator intercepts the sample parameter to _parse_samples and removes the undesired samples, so it doesn't fully parse each sample. It might be worth doing some profiling to see if there's a significant difference in performance.
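The monkey patch/decorator idea described above can be sketched roughly as below. This is an illustration under stated assumptions, not the code from this pull request: `ToyReader` is a stand-in class (not PyVCF's `Reader`), its `_parse_samples` signature is simplified, and `patch_sample_filter` is a hypothetical helper name. The counter shows that dropped samples never reach the parsing step.

```python
import functools


class ToyReader:
    """Stand-in for a VCF reader; NOT PyVCF's Reader."""

    parsed = 0  # counts how many individual samples were actually parsed

    def _parse_samples(self, samples, samp_fmt):
        calls = []
        for raw in samples:
            ToyReader.parsed += 1
            calls.append(dict(zip(samp_fmt.split(":"), raw.split(":"))))
        return calls


def patch_sample_filter(reader_cls, drop_indices):
    """Wrap reader_cls._parse_samples so dropped samples are removed
    before parsing. Returns a function that undoes the patch."""
    original = reader_cls._parse_samples
    drop = frozenset(drop_indices)

    @functools.wraps(original)
    def filtered(self, samples, *args, **kwargs):
        # Intercept the raw sample strings and discard the unwanted ones
        wanted = [s for i, s in enumerate(samples) if i not in drop]
        return original(self, wanted, *args, **kwargs)

    reader_cls._parse_samples = filtered

    def undo():
        reader_cls._parse_samples = original

    return undo


undo = patch_sample_filter(ToyReader, [0])
result = ToyReader()._parse_samples(["0|0:48", "1|0:48", "1/1:43"], "GT:GQ")
undo()  # restore the unpatched method
```

With the first sample dropped, only two samples are parsed, and calling `undo()` restores the original method. Whether this saves meaningful time on a real file with many samples would still need the profiling suggested above.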

@janedanes

Hi

I'm a beginner in Python and I'm trying to understand your code and how to use it to remove certain samples. If I wanted to remove a sample (for example 'NA00001' from the example file) from the VCF file, what code would I use?

@jamescasbon
Owner

People want this. We should definitely rebase it!

@lennax
Author

lennax commented Feb 10, 2014

I'm a little overwhelmed at the moment but I will set aside time to rebase this within the next two weeks.

@jamescasbon
Owner

Thanks, Lenna!

Conflicts:
	vcf/parser.py
	vcf/test/test_vcf.py

@lennax
Author

lennax commented Feb 22, 2014

I've merged master back into this and it passes the tests.

@jamescasbon
Copy link
Owner

Wow, thank you so much Lenna - lots of people have requested this. Sorry for letting it sit here so long.

jamescasbon added a commit that referenced this pull request Feb 23, 2014
@jamescasbon jamescasbon merged commit 45513dd into jamescasbon:master Feb 23, 2014
@lennax lennax deleted the lenna branch February 23, 2014 21:05
gotgenes pushed a commit to gotgenes/PyVCF that referenced this pull request May 13, 2014